SUMMARY


A likelihood  ratio  statistic  is  proposed  for  testing  goodness  of 

fit  with  grouped  data  which  are  subject  to  random  right  censoring.  It 

is  shown  that,  under  appropriate  conditions,  this  statistic  has  an 

asymptotic  chi-square  distribution  which  is  non-central  under  contiguous 

alternatives.  Some  examples  are  given  including  one  on  marijuana  usage 
which  needs  an  extension  of  the  test  to  the  doubly  censored  case. 

Some  key  words:  Likelihood  ratio;  Goodness  of  fit  test;  Grouping;  Random 

censoring; Multinomial distribution; Kaplan-Meier product limit estimator;
Self-consistency;  The  EM  method;  Double  censoring. 


1.  INTRODUCTION 

In  this  paper  we  consider  the  problem  of  testing  "goodness  of  fit" 
when  some  of  the  data  may  be  subject  to  random  censoring.  Single  right 
censoring  occurs  commonly  in  response  time  data.  Here  each  lifetime  X 
may  be  observed  exactly  or,  alternatively,  may  be  known  only  to  exceed  a 
certain  value.  These  situations  occur,  for  instance  in  industrial  life- 
testing, medical  follow-up  and  recidivism  studies.  Some  examples  are  given 
in  Kaplan  and  Meier  (1958).  We  shall  concentrate  on  this  case  of  single 


censoring  but  there  are  obvious  extensions  of  the  methods  we  shall  propose 
to more complicated censoring patterns such as interval or double censoring

(see  respectively  Peto  (1973),  and  Turnbull  (1974,  1976)). 

The  question  of  whether  the  observations  can  be  explained  by  a particular 

mathematical  model  is  an  important  problem.  For  instance,  if  an  exponential 

model  can  be  accepted,  then  further  analysis  - e.g.  estimation  and  testing  - 

is  simplified  considerably.  In  life-testing  it  becomes  meaningful  to  employ 

standards like MIL STD 690B and MIL STD 781B, which assume a constant

failure rate. If a goodness-of-fit test can lead the investigator to

accept  a certain  parametric  mathematical  or  physical  model,  then  this  can 

enable  him  to  glean  some  information  about  the  tail  of  the  response  time 

distribution,  which  is  often  important  in  reliability  studies.  Non-parametric 

methods  usually  reveal  little  about  the  tail  behaviour. 

We  will  consider  the  following  random  censorship  model.  There  are 

$N$ pairs of random variables $(X_1,Y_1), (X_2,Y_2), \ldots, (X_N,Y_N)$. Usually these
represent response times. The observed data, however, consist only of
$\min(X_i,Y_i)$ and $I_{[X_i \le Y_i]}$, for $1 \le i \le N$. (Here $I_A$ denotes the indicator
of the set $A$.) If $I_{[X_i \le Y_i]} = 0$, we say that $X_i$ has been censored
by $Y_i$, otherwise $X_i$ has been observed exactly. Assume that $X_1,X_2,\ldots,X_N$
are i.i.d. with survivor function $F(x) = P(X > x)$, and similarly
$Y_1,Y_2,\ldots,Y_N$ are i.i.d. with survivor function $G(y) = P(Y > y)$. Assume also
that $X_i$ and $Y_i$ are independent ($1 \le i \le N$); without this assumption
there are identifiability problems (Tsiatis, 1975). Both $F$ and $G$ are
unknown. Here $G$ is a "nuisance parameter" and the goal is to make
inferences about $F$, specifically to test some null hypothesis $H_0: F = F_0$.
$H_0$ may be simple or composite, i.e. $F_0$ may be completely specified or
may depend on some parameters which are left unspecified.
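The random censorship model can be illustrated with a small simulation. The sketch below is only an illustration under assumed distributions: the exponential choices for $X$ and $Y$, the rate values and the function name `simulate_censored` are hypothetical and are not taken from the paper. It shows that only $\min(X_i,Y_i)$ and the indicator are retained as data.

```python
import random

def simulate_censored(n, rate_x, rate_y, seed=0):
    """Draw n independent pairs (X_i, Y_i) and return only the observable data."""
    rng = random.Random(seed)
    observed = []
    for _ in range(n):
        x = rng.expovariate(rate_x)  # lifetime X_i (exponential is an assumption)
        y = rng.expovariate(rate_y)  # censoring time Y_i, independent of X_i
        z = min(x, y)                # the recorded response time
        exact = 1 if x <= y else 0   # indicator I[X_i <= Y_i]; 0 means censored
        observed.append((z, exact))
    return observed

data = simulate_censored(200, rate_x=0.2, rate_y=0.1)
n_exact = sum(flag for _, flag in data)  # number of exactly observed lifetimes
```

In this sketch the censoring distribution $G$ plays no role in the quantity of interest; it only determines which lifetimes are seen exactly.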


Goodness of fit analysis is substantially complicated by the presence
of  censoring,  and  most  researchers  have  only  considered  the  singly  censored 
case. Among graphical methods, Nelson (1972) describes methods for
plotting  the  cumulative  hazard  function,  while  Barlow  and  Campo  (1975) 
have  proposed  "total  time  on  test"  plots.  In  the  analysis  of  heart 
transplant  survival  data  Turnbull,  Brown  and  Hu  (1974)  also  used  graphical 
methods  to  compare  the  best  fitting  exponential  and  Pareto  model  curves 
with  the  product-limit  estimate  curve  as  defined  by  Kaplan  and  Meier 
(1958). Lamborn (1969) described a Pearson type statistic and, extending
the methods of Roy (1956) and Watson (1958), she established that this is
asymptotically distributed as a linear combination of independent $\chi^2_1$
random variables. The test is difficult to use in practice because this
asymptotic distribution is different for each problem. Greenberg, Bayard
and Byar (1974) also used a Pearson $\chi^2$ statistic but obviated the
difficulties, because for each censored observation, it was known into which
class interval it fell. Barr and Davidson (1973) described a
Kolmogorov-Smirnov (KS) test for data which, if censored, are censored at the same
fixed point (i.e. $Y_1 = Y_2 = \ldots = Y_N$). This situation is common in

life-testing  but  not  in  medical  follow-up  or  recidivism  data.  The 
asymptotic  distribution  of  the  Barr-Davidson  statistic  has  been  obtained  and 
tabulated  by  Koziol  and  Byar  (1975).  Cramer-von  Mises  (CM)  type  statistics 
have  been  investigated  by  Pettit  and  Stephens  (1976)  for  the  case  with  all 
$Y_i$ equal (they also consider double censoring); and by Koziol and Green
(1975) for the particular model $G = F_0^{\beta}$ under $H_0$ for some $\beta > 0$. In

practice this assumption must be verified first and $\beta$ estimated. A
limitation of these KS and CM type tests is that $H_0$ must be simple,
i.e. $F_0$ completely specified. Finally, Barlow and Proschan (1969) have
described  a test  for  the  exponential  model  which  is  unbiased  against  IFRA 
alternatives (see also Harris (1976)). Stollmack and Harris (1974) have
applied  this  last  test  to  the  analysis  of  recidivism  data. 

In  this  paper  we  present  a likelihood  ratio  statistic  for  testing 
goodness of fit of hypothesis $H_0$, which may be simple or composite, and
is  applicable  when  the  ranges  of  the  random  variables  X,Y  are  discrete 
or  when  the  observations  are  grouped  into  discrete  intervals.  Using  the 

general results of Weiss (1975) concerning the likelihood ratio in non-standard
cases, the statistic is shown to have an asymptotic $\chi^2$ distribution.

Specifically we assume that $X$ is discrete with finite range
$t_1 < t_2 < \ldots < t_m$, which occurs, for instance, in response time data
if there were a natural discrete time scale, e.g. see Klotz (1976).
Alternatively we can assume that the data are grouped and the lifetimes
recorded only as belonging to one of the $m$ intervals $(t_0,t_1], (t_1,t_2], \ldots,
(t_{m-1},t_m]$, where usually $t_0 = 0$ and often $t_m = +\infty$. Further we assume
that the range of $Y$ is $\{t_1+, t_2+, \ldots, t_m+\}$, i.e. any right censored
observations (or "losses") at $t_i$ occur immediately after any observed
deaths.  This  setup  is  often  appropriate  in  studies  with  periodic  inspection 
(or  "snapshots")  - see  the  discussion  in  Section  1.4  of  Kaplan  and  Meier 
(1958).  Essentially  the  two  problems  (X  discrete,  or  X continuous  with 
grouped  observations)  are  the  same  with  slight  modifications,  but  it  is 
easier  to  think  of  X taking  on  discrete  values  and  we  shall  do  this  in 
our  further  discussion.  In  Section  3 we  will  give  examples  of  both  situations. 

2. THE LIKELIHOOD RATIO TEST


For $1 \le i \le m$, define $F_i = F(t_i) = P(X > t_i)$, $s_i = F_{i-1} - F_i$,
$G_i = G(t_i+) = P(Y > t_i+)$, $u_i = G_{i-1} - G_i$, with $\sum_{i=1}^{m} s_i = \sum_{i=1}^{m} u_i = 1$,
$F_m = G_m = 0$. Following the notation of Kaplan and Meier (1958), we define
frequencies $\delta_i, \lambda_i$ ($1 \le i \le m$) where there are $\delta_i$ pairs with $X = t_i$
and $Y > t_i$ (exact observations at $t_i$) and $\lambda_i$ observations with
$X > t_i$ and $Y = t_i+$ (losses at $t_i$). Note that since $F_m = 0$, we must
have $\lambda_m = 0$. (The assumption $F_m = 0$ is no loss of generality; if
$F_m > 0$, $\lambda_m > 0$ we can add an extra "bin" or time point, $t_{m+1}$ say.) Note
also that $\sum_{i=1}^{m} (\delta_i + \lambda_i) = N$.


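Define $\mathbf{s} = (s_1,\ldots,s_{m-1})$, $\mathbf{u} = (u_1,\ldots,u_{m-1})$. Then the likelihood
$L$ is given by

$$L(\mathbf{s},\mathbf{u}) = [u_m s_m]^{\delta_m} \prod_{i=1}^{m-1} [s_i(1 - u_1 - \ldots - u_{i-1})]^{\delta_i}\, [u_i(1 - s_1 - \ldots - s_i)]^{\lambda_i} \qquad (1)$$

where $s_m = 1 - s_1 - \ldots - s_{m-1}$ and $u_m = 1 - u_1 - \ldots - u_{m-1}$. Define
$\Omega = \{\mathbf{s}: s_i \ge 0, \sum_{i=1}^{m-1} s_i \le 1\} \subset R^{m-1}$. Kaplan and Meier (1958) show that
$L$ is maximised in $\Omega$ by $\mathbf{s} = \hat{\mathbf{s}} = (\hat s_1,\ldots,\hat s_{m-1})$, where $\hat s_i = \hat F_{i-1} - \hat F_i$ and

$$\hat F_i = \hat F_{i-1}\left[1 - \left\{\delta_i \Big/ \sum_{j=i}^{m} (\delta_j + \lambda_j)\right\}\right] \quad (1 \le i \le m) \quad \text{with } \hat F_0 = 1. \qquad (2)$$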
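The product-limit recursion (2) is easy to evaluate from the frequencies alone. The following is a minimal sketch, assuming the risk set $\sum_{j \ge i} (\delta_j + \lambda_j)$ is non-empty at every $t_i$; the function name `product_limit` is hypothetical. The counts used are those of Table 1 in Section 3.

```python
def product_limit(delta, lam):
    """Return (F_hat, s_hat) from exact counts delta_i and loss counts lam_i."""
    at_risk = sum(delta) + sum(lam)  # all N items are at risk just before t_1
    F_prev, F_hat, s_hat = 1.0, [], []
    for d, l in zip(delta, lam):
        F_i = F_prev * (1.0 - d / at_risk)  # recursion (2)
        F_hat.append(F_i)
        s_hat.append(F_prev - F_i)          # s_hat_i = F_hat_{i-1} - F_hat_i
        at_risk -= d + l                    # deaths and losses at t_i leave the risk set
        F_prev = F_i
    return F_hat, s_hat

# Frequencies from Table 1 (m = 8, N = 100).
delta = [3, 5, 4, 10, 9, 6, 15, 16]
lam = [0, 20, 0, 0, 12, 0, 0, 0]
F_hat, s_hat = product_limit(delta, lam)
```

Because losses at $t_i$ are taken to occur immediately after the deaths at $t_i$, the risk set is reduced by $\delta_i + \lambda_i$ only after $\hat F_i$ has been computed.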

Suppose we wish to test the hypothesis

$$H_0: s_i = Q_i(\boldsymbol{\phi}) \quad \text{for } 1 \le i \le m-1$$

for some unspecified $\boldsymbol{\phi} = (\phi_1,\ldots,\phi_a) \in R^a$ where $1 \le a \le m-2$ and the
$\{Q_i\}$ are given functions from $R^a$ into $R^1$.
We assume that $s_i = Q_i(\boldsymbol{\phi})$ for $1 \le i \le a$ defines a 1-1
relation between $\boldsymbol{\phi}$ and $(s_1,s_2,\ldots,s_a)$. We can cater for the case of
$H_0$ simple, i.e. when the $s_i$ are completely specified, by formally allowing
the case $a = 0$. For example if we wish to test that $X$ is geometrically
distributed we have $Q_i(\phi) = \phi(1-\phi)^{i-1}$ for $1 \le i \le m-1$. Here $a = 1$ if
$\phi$ is unspecified, while $a = 0$ if $\phi$ is specified by $H_0$.

The hypothesis $H_0$ implies that $\mathbf{s}$ belongs to some subspace,
$\Omega_0$ say, of the parameter space $\Omega$. Let $\tilde{\boldsymbol{\phi}}$ denote the MLE of $\boldsymbol{\phi}$ under $H_0$
and thus $\tilde{\mathbf{s}} = Q(\tilde{\boldsymbol{\phi}})$ will maximise the likelihood (1) under the restriction
$\mathbf{s} \in \Omega_0$. The problem of calculating $\tilde{\boldsymbol{\phi}}$ and $\tilde{\mathbf{s}}$ can be quite difficult
computationally (except of course when $H_0$ is simple, $a = 0$) and there are
many papers in the literature concerned with estimating parameters with
grouped and censored data in special cases, e.g. Weibull or Poisson;
perhaps the best general references are to Blight (1970) and to Dempster,
Laird and Rubin (1976). Once we have obtained $\tilde{\mathbf{s}}$, we can form the
generalised likelihood ratio:


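$$\Lambda_N = \frac{\max_{\mathbf{s} \in \Omega_0,\, \mathbf{u}} L(\mathbf{s},\mathbf{u})}{\max_{\mathbf{s} \in \Omega,\, \mathbf{u}} L(\mathbf{s},\mathbf{u})} = \prod_{i=1}^{m} \left(\frac{\tilde s_i}{\hat s_i}\right)^{\delta_i} \prod_{i=1}^{m-1} \left(\frac{1 - \tilde s_1 - \ldots - \tilde s_i}{1 - \hat s_1 - \ldots - \hat s_i}\right)^{\lambda_i} \qquad (3)$$

In (3), we have defined $\tilde s_m = 1 - \tilde s_1 - \ldots - \tilde s_{m-1}$, and similarly $\hat s_m$. Note
that $\Lambda_N$ does not depend on $\mathbf{u}$.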
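Expression (3) translates directly into a short routine. The sketch below is illustrative only; the function name `neg2_log_lr` and the toy frequencies are hypothetical. It assumes that $\hat s_i > 0$ and $\tilde s_i > 0$ whenever $\delta_i > 0$, and it skips terms with zero frequency so that no logarithm of zero is taken.

```python
import math

def neg2_log_lr(delta, lam, s_tilde, s_hat):
    """-2 log Lambda_N from expression (3); all inputs are length-m sequences."""
    m = len(delta)
    log_lr = 0.0
    tail_tilde, tail_hat = 1.0, 1.0  # running values of 1 - s_1 - ... - s_i
    for i in range(m):
        if delta[i] > 0:
            log_lr += delta[i] * math.log(s_tilde[i] / s_hat[i])
        tail_tilde -= s_tilde[i]
        tail_hat -= s_hat[i]
        if i < m - 1 and lam[i] > 0:  # the loss product runs over i = 1..m-1 only
            log_lr += lam[i] * math.log(tail_tilde / tail_hat)
    return -2.0 * log_lr

# Toy frequencies (hypothetical): N = 12, product-limit estimate (1/6, 5/18, 5/9).
delta, lam = [2, 3, 5], [1, 1, 0]
s_hat = [1 / 6, 5 / 18, 5 / 9]
stat_null = neg2_log_lr(delta, lam, s_hat, s_hat)            # restricted = unrestricted
stat_alt = neg2_log_lr(delta, lam, [1 / 3, 1 / 3, 1 / 3], s_hat)
```

Since $\hat{\mathbf{s}}$ maximises the likelihood over $\Omega$, the statistic is zero when the restricted and unrestricted estimates coincide and positive otherwise.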

Further, define $\phi_{a+1}, \phi_{a+2}, \ldots, \phi_{m-1}$ by the relations

$$s_i = \phi_i + Q_i(\phi_1,\ldots,\phi_a) \quad \text{for } i = a+1,\ldots,m-1.$$

We have now reparametrized: our new parameters are $u_1,u_2,\ldots,u_{m-1}$,
$\phi_1,\ldots,\phi_{m-1}$. Of these $u_1,u_2,\ldots,u_{m-1},\phi_1,\ldots,\phi_a$ are nuisance parameters
and the hypothesis $H_0$ which we wish to test becomes

$$H_0: \phi_{a+1} = \phi_{a+2} = \ldots = \phi_{m-1} = 0.$$


Let $\theta^0$ denote a particular configuration $(u_1^0,\ldots,u_{m-1}^0,
\phi_1^0,\ldots,\phi_a^0,0,\ldots,0)$. We assume that throughout a neighborhood of $(\phi_1^0,\ldots,\phi_a^0)$,
all third derivatives of $Q_i(\phi_1,\ldots,\phi_a)$ are bounded in absolute value, and
that throughout a neighborhood of $\theta^0$, the $2m$ quantities $u_1,\ldots,u_m,s_1,\ldots,s_m$
are all bounded away from zero. Here and in the Appendix, when we speak of a
point $\theta = (u_1,\ldots,u_{m-1},\phi_1,\ldots,\phi_{m-1})$, it is understood that $s_i$ is given
by $Q_i(\phi_1,\ldots,\phi_a)$ for $i = 1,\ldots,a$; by $\phi_i + Q_i(\phi_1,\ldots,\phi_a)$ for $i = a+1,\ldots,m-1$;
and $s_m = 1 - (s_1 + \ldots + s_{m-1})$. Then when $\theta^0$ is the true parameter
vector, $-\frac{1}{N}\frac{\partial^2 \log L}{\partial u_\alpha\,\partial u_\beta}$ tends stochastically to a limit, $V_{\alpha\beta}(\theta^0)$ say, and
$-\frac{1}{N}\frac{\partial^2 \log L}{\partial \phi_\alpha\,\partial \phi_\beta}$ tends stochastically to a limit, $W_{\alpha\beta}(\theta^0)$ say, for $1 \le \alpha \le m-1$,
$1 \le \beta \le m-1$. These define $(m-1)\times(m-1)$ matrices $V(\theta^0)$ and $W(\theta^0)$. The
details of this and expressions for the $\{V_{\alpha\beta}\}$ and $\{W_{\alpha\beta}\}$ are given in
the Appendix. We are now ready to state the following:

THEOREM: Suppose $V(\theta^0)$ and $W(\theta^0)$ are both non-singular, and
consider the contiguous alternative where the true parameter values are
$(u_1^0,\ldots,u_{m-1}^0,\phi_1^0,\ldots,\phi_a^0,c_{a+1}/\sqrt{N},\ldots,c_{m-1}/\sqrt{N})$. Then the asymptotic distribution
of $-2\log\Lambda_N$ is $\chi^2_{m-a-1}(\delta^2)$. Here the non-centrality parameter $\delta^2$ is

$$(c_{a+1},\ldots,c_{m-1})\, Z^{-1}(\theta^0)\, (c_{a+1},\ldots,c_{m-1})'$$

where $Z(\theta^0)$ denotes the $(m-a-1)\times(m-a-1)$ matrix derived by deleting the ...


The  proof  is  given  in  the  Appendix. 


REMARK 1. The asymptotic distribution of $-2\log\Lambda_N$ under $H_0$ is
$\chi^2_{m-a-1}$, as might be expected. Thus knowledge of $V$, $W$, or $Z$ is
not required to carry out a test of significance. An asymptotically level
$\alpha$ test of $H_0$ will reject when $-2\log\Lambda_N$ exceeds the $(1 - \alpha)$ quantile
of the $\chi^2_{m-a-1}$ distribution.
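The rejection rule of Remark 1 can be carried out without tables. The sketch below is one way to do it with the Python standard library, using the closed forms of the central chi-square tail probability for integer degrees of freedom; the function names `chi2_sf` and `reject_h0` are hypothetical. Rejecting when the statistic exceeds the $(1-\alpha)$ quantile is the same as rejecting when the tail probability is below $\alpha$.

```python
import math

def chi2_sf(x, df):
    """P(chi-square with integer df exceeds x), using closed forms for integer df."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df = 2k: exp(-y) * sum_{j=0}^{k-1} y^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= y / j
            total += term
        return math.exp(-y) * total
    # Odd df = 2k+1: erfc(sqrt(y)) + exp(-y) * sum_{j=1}^{k} y^(j-1/2) / Gamma(j+1/2)
    total = math.erfc(math.sqrt(y))
    for j in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y) * y ** (j - 0.5) / math.gamma(j + 0.5)
    return total

def reject_h0(stat, m, a, alpha=0.05):
    """Asymptotic level-alpha test: refer -2 log Lambda_N to chi-square(m - a - 1)."""
    return chi2_sf(stat, m - a - 1) < alpha
```

For instance, `reject_h0(stat, m=8, a=1)` refers a statistic to the chi-square distribution with 6 degrees of freedom, as in the exponential example of Section 3.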


REMARK 2. Generally, the matrices $V(\theta^0)$, $W(\theta^0)$ are non-singular as
long as it is known that none of the true $\{s_i\}$ or $\{u_i\}$ ($1 \le i \le m$) are
zero. However if it is known that some of these parameters must be zero,
then the analysis can go through with the appropriate reduction in
dimensionality. For instance, as in the second example of the next section,
the experiment may be designed so that all but a certain number, $b$ say,
of the $u_i$ ($1 \le i \le m$) are constrained to be zero. These zero $u_i$ are
just omitted from $\theta^0$ and $L$; $V$ is a $(b-1)\times(b-1)$ matrix, and otherwise
the statement of the theorem is unchanged.

REMARK  3.  The  contiguous  or  "challenging"  alternatives  in  the  theorem 
are  commonly  used  in  large  sample  theory  in  order  to  keep  Type  I and  Type  II 
error  probabilities  bounded  away  from  nought  and  one.  For  finite  sample  sizes, 
we  should  modify  our  test  as  in  Weiss  (1975)  to  guard  against  non-contiguous 
alternatives. We do this by also rejecting $H_0$ if, for any $i$ ($a+1 \le i \le m-1$),

$$|\hat s_i - Q_i(\boldsymbol{\phi}^*)| > k \cdot N^{-\Delta}$$

where $k > 0$, $2/3 < \Delta < 1$ and (when $a \ne 0$) $\boldsymbol{\phi}^* \in R^a$ satisfies
$\hat s_i = Q_i(\boldsymbol{\phi}^*)$ ($1 \le i \le a$). This modification will not affect the asymptotic
properties of the test under $H_0$ or contiguous alternatives.

3.  EXAMPLES 

Example  1.  Suppose  we  consider  the  geometric  example  mentioned 
earlier.  Here 


$$H_0: s_i = \phi(1-\phi)^{i-1} \quad (1 \le i \le m-1), \qquad s_m = (1-\phi)^{m-1}$$

for some $0 \le \phi \le 1$ unspecified. Such an $H_0$ would be applicable to testing
a constant hazard rate with $t_1,t_2,\ldots,t_{m-1}$ all equally spaced.
($\delta_m$ represents those items still alive at $t_{m-1}$.)

Under $H_0$, the likelihood is proportional to

$$L^*(\phi) = \prod_{i=1}^{m} [\phi(1-\phi)^{i-1}]^{\delta_i} \prod_{i=1}^{m-1} (1-\phi)^{i\lambda_i}.$$

Setting the derivative of $L^*(\phi)$ equal to zero, we see that the maximising
value of $\phi$ is $\tilde\phi = \sum_{i=1}^{m} \delta_i \big/ [\sum_{j=1}^{m} j(\lambda_j + \delta_j)]$. Then we can obtain the
likelihood ratio by substituting in (3) for $\hat{\mathbf{s}}$ by (2) and for
$\tilde s_i$ by $\tilde\phi(1-\tilde\phi)^{i-1}$ ($1 \le i \le m-1$). Here $a = 1$, and so the cutoff point
will be based on the percentage points of the $\chi^2_{m-2}$ distribution. (Of
course, if $H_0$ also specified the value of $\phi$ we would take $\tilde\phi$ as that
value and then $a = 0$).


Example 2. Consider the following grouped data taken from page
465 of Kaplan and Meier (1958).

TABLE 1.

  i          1     2     3     4     5     6     7     8
  t_i        1    1.7    2     3    3.6    4     5     T
  δ_i        3     5     4    10     9     6    15    16
  λ_i        0    20     0     0    12     0     0     0


Take $t_0 = 0$, $T > 5$. Thus $m = 8$ and $N = 100$. Here we could say that
losses could only occur at times 1.7 and 3.6 because of the design of the
experiment. (Actually, the deaths at $T$ were really reported as losses
at 5+. Clearly it makes no difference how we apportion the 16 losses at
time 5+ between $\delta_8$ and $\lambda_7$.) By Remark 2 however, the fact that only
$u_2$, $u_5$ and $u_8$ can be non-zero does not affect the test. In Table 1,
$\delta_i$ is interpreted as the frequency of observed response times occurring
in $(t_{i-1}, t_i]$.

Consider the hypothesis $H_0$ that $X$ is exponentially distributed.
Thus

$$H_0: s_i = e^{-\phi t_{i-1}} - e^{-\phi t_i} \quad (1 \le i \le m-1).$$

Under $H_0$, the likelihood is proportional to

$$L^*(\phi) = \prod_{i=1}^{m} \left[(e^{-\phi t_{i-1}} - e^{-\phi t_i})^{\delta_i}\, e^{-\phi t_i \lambda_i}\right]$$

where we have used the convention $t_m = +\infty$ and $t_m\lambda_m = 0$. Setting
$\partial \log L^* / \partial\phi = 0$, we have that $\tilde\phi$ satisfies

$$\sum_{i=1}^{m-1} \lambda_i t_i = \sum_{i=1}^{m} \delta_i \left[\frac{-t_{i-1}\exp(-\tilde\phi t_{i-1}) + t_i\exp(-\tilde\phi t_i)}{\exp(-\tilde\phi t_{i-1}) - \exp(-\tilde\phi t_i)}\right].$$

Solving numerically we obtain $\tilde\phi = .1638$ and using
$\tilde s_i = \exp(-\tilde\phi t_{i-1}) - \exp(-\tilde\phi t_i)$
we have $\tilde{\mathbf{s}} = (.151, .092, .036, .109, .057, .035, .078, .441)$. From
Kaplan and Meier (p. 465) we have

$\hat{\mathbf{s}} = (.03, .05, .05, .13, .11, .11, .25)$ and $\hat s_m = .27$.

Evaluating $-2\log\Lambda_N(\tilde{\mathbf{s}},\hat{\mathbf{s}}) = 46.1$, and comparing this with the tables of
$\chi^2_6$, we see that $H_0$ is rejected at any reasonable significance level.
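The calculation in this example can be retraced numerically. The sketch below is an illustration only: it solves the score equation for the exponential model by bisection (the bracketing interval is an assumption), recomputes the product-limit estimate from (2) without rounding, and evaluates (3). Because unrounded product-limit values are used, the resulting statistic should be close to, but need not coincide with, the value quoted in the text.

```python
import math

# t_0, ..., t_8 for Table 1; t_8 = T is treated as +infinity (the convention t_m = +inf).
t = [0.0, 1.0, 1.7, 2.0, 3.0, 3.6, 4.0, 5.0, math.inf]
delta = [3, 5, 4, 10, 9, 6, 15, 16]
lam = [0, 20, 0, 0, 12, 0, 0, 0]
m = len(delta)

def surv(phi, x):
    """Exponential survivor function exp(-phi * x), taken as 0 at x = +infinity."""
    return 0.0 if math.isinf(x) else math.exp(-phi * x)

def score(phi):
    """Derivative of log L*(phi) for the grouped, censored exponential model."""
    total = 0.0
    for i in range(1, m + 1):
        lo, hi = t[i - 1], t[i]
        hi_term = 0.0 if math.isinf(hi) else hi * surv(phi, hi)
        cell = surv(phi, lo) - surv(phi, hi)
        total += delta[i - 1] * (hi_term - lo * surv(phi, lo)) / cell
        if lam[i - 1] > 0:
            total -= lam[i - 1] * hi
    return total

# Bisection on an assumed bracket where the score changes sign.
lo_phi, hi_phi = 0.01, 1.0
for _ in range(60):
    mid = 0.5 * (lo_phi + hi_phi)
    if score(mid) > 0:
        lo_phi = mid
    else:
        hi_phi = mid
phi_tilde = 0.5 * (lo_phi + hi_phi)
s_tilde = [surv(phi_tilde, t[i - 1]) - surv(phi_tilde, t[i]) for i in range(1, m + 1)]

# Unrestricted product-limit estimate, recursion (2).
at_risk, F_prev, s_hat = sum(delta) + sum(lam), 1.0, []
for d, l in zip(delta, lam):
    F_i = F_prev * (1.0 - d / at_risk)
    s_hat.append(F_prev - F_i)
    at_risk -= d + l
    F_prev = F_i

# -2 log Lambda_N from expression (3).
log_lr, tail_tilde, tail_hat = 0.0, 1.0, 1.0
for i in range(m):
    log_lr += delta[i] * math.log(s_tilde[i] / s_hat[i])
    tail_tilde -= s_tilde[i]
    tail_hat -= s_hat[i]
    if lam[i] > 0:
        log_lr += lam[i] * math.log(tail_tilde / tail_hat)
stat = -2.0 * log_lr
```

With $m = 8$ and $a = 1$, `stat` is referred to the chi-square distribution with $m - a - 1 = 6$ degrees of freedom.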

With  a little  more  numerical  work,  one  could,  in  a similar  fashion, 

investigate the hypothesis that $X$ was Weibull or Pareto distributed, for
example. In the Weibull case $s_i = \exp[-(\eta t_{i-1})^{\beta}] - \exp[-(\eta t_i)^{\beta}]$ under
$H_0$, while under the Pareto model $s_i = [\eta/(\eta + t_{i-1})]^{\beta} - [\eta/(\eta + t_i)]^{\beta}$
where $\boldsymbol{\phi} = (\eta,\beta)$ is unspecified. In both cases $a = 2$, and the percentage
points of $\chi^2_5$ are applicable.

Example 3. (Doubly censored data). The techniques described in this
paper can be extended analogously to handle doubly censored data. Here
in addition to the frequencies $\{\delta_i\}$ and $\{\lambda_i\}$, there are frequencies $\{\mu_i\}$,
where $\mu_i$ ($1 \le i \le m$) represents the number of observations left censored
at $t_i$. (For a more detailed discussion of double censoring see Turnbull
(1974).) As an example consider Table 2 below which summarizes the answers of
191 high school students to the question "When did you first use
marijuana?" (This was part of a large study on the Stanford-Palo Alto Peer
Counseling Program, which has been reported by Hamburg, Kraemer and Jahnke
(1975).) Any direct answer such as "12,14,15,..." gives rise to an
exact observation. If the student answered: "I have never used it" then
this gives rise to an observation which is censored on the right at his/her
present age. The final possibility was someone who answered "I have used
it but cannot recall just when the first time was." This gives rise to a
left censored observation where age of first use is known only to be
prior to the student's current age.


TABLE 2.

  i          1     2     3     4     5     6     7     8     9    10
  δ_i        4    12    19    24    20    13     3     1     0     4
  λ_i        0     0     2    15    24    18    14     6     0     0
  μ_i        0     0     0     1     2     3     2     3     1     0


Now many of the frequencies in the table are zero or very small
and so the asymptotic $\chi^2$ distribution is unlikely to be a meaningful
approximation  to  the  distribution  of  the  likelihood  ratio.  Nevertheless 
we  shall  proceed  in  order  to  show  how  the  calculations  are  carried  out 
in  the  doubly  censored  case. 

Let  us  consider  the  problem  of  testing  goodness  of  fit  of  a negative 
binomial  distribution:  (This  hypothesis  was  suggested  to  us  by  one  of  the 
investigators  and  is  based  on  a model  in  which  opportunities  to  start 
taking  drugs  occur  at  random  times  and  that  different  children  have 
varying  susceptibilities.  This  theory  leads  to  mixture  of  Poissons  of 
which  the  negative  binomial  is  a particular  example.)  Thus 


(The  empty  product  is  defined  to  be  unity. ) 

Using the method of self-consistency (Turnbull, 1974) or equivalently
the EM algorithm (Dempster et al., 1976) one obtains $\tilde\eta = 6.75$, $\tilde\beta = 0.97$,
$\tilde{\mathbf{s}} = (.010, .033, .064, .092, .111, .118, .114, .103, .088, .267)$.


Noting  that  there  is  now  an  extra  factor 


in the expression (3) for $\Lambda_N$ due to the left censored observations we
calculate the value of $-2\log\Lambda_N$ to be 32.5.

Comparing this with the percentage points of $\chi^2_7$, we see that the
negative  binomial  does  not  fit  the  observed  data,  (that  is  assuming  that 
the  asymptotic  approximation  is  valid,  which  assumption  we  make  only  for 
pedagogic  reasons). 

APPENDIX


This  appendix  is  devoted  to  a proof  of  the  Theorem.  The  proof 
consists  of  a verification  that  the  assumptions  of  Weiss  (1975)  hold  in  the 
present  case. 

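Fix a value $\Delta$ in the open interval $(0, \tfrac{1}{6})$, and define $J_N(\theta^0)$ as
the set of points $(u_1,\ldots,u_{m-1},\phi_1,\ldots,\phi_{m-1})$ satisfying

$$|u_i - u_i^0| \le \frac{N^{1/6-\Delta}}{\sqrt{N}}, \quad i = 1,\ldots,m-1;$$

$$|\phi_i - \phi_i^0| \le \frac{N^{1/6-\Delta}}{\sqrt{N}}, \quad i = 1,\ldots,a;$$

$$|\phi_j| \le \frac{N^{1/6-\Delta}}{\sqrt{N}}, \quad j = a+1,\ldots,m-1.$$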


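It is easily verified that

$$\frac{1}{N}\frac{\partial^2}{\partial u_\alpha\,\partial\phi_\beta}\log L(\theta) = 0 \quad \text{for } \alpha,\beta = 1,\ldots,m-1;$$

$$\frac{1}{N}\frac{\partial^2}{\partial u_\alpha\,\partial u_\beta}\log L(\theta) = \sum_{i=1}^{m}\frac{\delta_i}{N}\, r_{\alpha,\beta,i}(\theta) + \sum_{i=1}^{m-1}\frac{\lambda_i}{N}\, D_{\alpha,\beta,i}(\theta) \quad \text{for } \alpha,\beta = 1,\ldots,m-1;$$

$$\frac{1}{N}\frac{\partial^2}{\partial\phi_\alpha\,\partial\phi_\beta}\log L(\theta) = \sum_{i=1}^{m}\frac{\delta_i}{N}\, r^*_{\alpha,\beta,i}(\theta) + \sum_{i=1}^{m-1}\frac{\lambda_i}{N}\, D^*_{\alpha,\beta,i}(\theta) \quad \text{for } \alpha,\beta = 1,\ldots,m-1;$$

where $r_{\alpha,\beta,i}(\theta)$, $D_{\alpha,\beta,i}(\theta)$, $r^*_{\alpha,\beta,i}(\theta)$, $D^*_{\alpha,\beta,i}(\theta)$ do not depend on $N$,
and are rational functions of $\{u_i\}$, $\{s_i\}$ and first and second derivatives of
$\{Q_i(\phi_1,\ldots,\phi_a)\}$, all the denominators being first and second powers of
sums of one or more of $(u_1,\ldots,u_m,s_1,\ldots,s_m)$. Thus these denominators
are bounded away from zero if the derivatives are taken for $\theta$'s in
$J_N(\theta^0)$.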


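Define $V_{\alpha\beta}(\theta^0)$ as

$$-\left[\sum_{i=1}^{m-1} s_i^0(1 - u_1^0 - \ldots - u_{i-1}^0)\, r_{\alpha,\beta,i}(\theta^0) + \sum_{i=1}^{m-1} u_i^0(1 - s_1^0 - \ldots - s_i^0)\, D_{\alpha,\beta,i}(\theta^0) + (u_m^0 s_m^0)\, r_{\alpha,\beta,m}(\theta^0)\right]$$

and $W_{\alpha\beta}(\theta^0)$ as

$$-\left[\sum_{i=1}^{m-1} s_i^0(1 - u_1^0 - \ldots - u_{i-1}^0)\, r^*_{\alpha,\beta,i}(\theta^0) + \sum_{i=1}^{m-1} u_i^0(1 - s_1^0 - \ldots - s_i^0)\, D^*_{\alpha,\beta,i}(\theta^0) + (u_m^0 s_m^0)\, r^*_{\alpha,\beta,m}(\theta^0)\right]$$

for $\alpha, \beta = 1,\ldots,m-1$.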

* 

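Suppose the true parameter point depends on $N$, and is $\theta^*(N)$ in
$J_N(\theta^0)$. Define $\sigma_i(N)$ as $\left|\frac{\delta_i}{N} - s_i(\theta^*(N))\,[1 - u_1(\theta^*(N)) - \ldots - u_{i-1}(\theta^*(N))]\right|$
for $i = 1,\ldots,m-1$, $\sigma_m(N)$ as $\left|\frac{\delta_m}{N} - u_m(\theta^*(N))\, s_m(\theta^*(N))\right|$, and $\sigma_i^*(N)$ as
$\left|\frac{\lambda_i}{N} - u_i(\theta^*(N))\,[1 - s_1(\theta^*(N)) - \ldots - s_i(\theta^*(N))]\right|$ for $i = 1,\ldots,m-1$.
By the properties of the multinomial distribution, $\sqrt{N}\,\sigma_i(N)$, $\sqrt{N}\,\sigma_i^*(N)$
are all finite with probability one. It follows that we have

$$\frac{\delta_i}{N} = s_i^0(1 - u_1^0 - \ldots - u_{i-1}^0) + \bar\sigma_i(N)\,\frac{N^{1/6-\Delta}}{\sqrt{N}} \quad (i = 1,\ldots,m-1)$$

$$\frac{\delta_m}{N} = u_m^0 s_m^0 + \bar\sigma_m(N)\,\frac{N^{1/6-\Delta}}{\sqrt{N}}$$

$$\frac{\lambda_i}{N} = u_i^0(1 - s_1^0 - \ldots - s_i^0) + \bar\sigma_i^*(N)\,\frac{N^{1/6-\Delta}}{\sqrt{N}} \quad (i = 1,\ldots,m-1)$$

where $P\{|\bar\sigma_i(N)| < \bar q(\theta^0) \text{ and } |\bar\sigma_i^*(N)| < \bar q(\theta^0) \text{ for all } i\} > 1 - q_N(\theta^0)$, where $\bar q(\theta^0)$ is
finite, and $\lim_{N\to\infty} q_N(\theta^0) = 0$. Also, by our assumptions
about $Q_i(\phi_1,\ldots,\phi_a)$, we have, for any $\theta(N)$ in $J_N(\theta^0)$, that

$$|r_{\alpha,\beta,i}(\theta(N)) - r_{\alpha,\beta,i}(\theta^0)| < \bar q(\theta^0)\,\frac{N^{1/6-\Delta}}{\sqrt{N}}$$

$$|D_{\alpha,\beta,i}(\theta(N)) - D_{\alpha,\beta,i}(\theta^0)| < \bar q(\theta^0)\,\frac{N^{1/6-\Delta}}{\sqrt{N}}$$

$$|r^*_{\alpha,\beta,i}(\theta(N)) - r^*_{\alpha,\beta,i}(\theta^0)| < \bar q(\theta^0)\,\frac{N^{1/6-\Delta}}{\sqrt{N}}$$

$$|D^*_{\alpha,\beta,i}(\theta(N)) - D^*_{\alpha,\beta,i}(\theta^0)| < \bar q(\theta^0)\,\frac{N^{1/6-\Delta}}{\sqrt{N}}$$

for all $\alpha,\beta,i$, where $\bar q(\theta^0)$ is finite.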

The verification that the assumptions of Weiss (1975) hold is now
immediate: the matrix $B(\theta^0)$ in Weiss (1975) is the $2(m-1)$ by $2(m-1)$
matrix written in partitioned form as

$$\begin{pmatrix} V(\theta^0) & 0 \\ 0 & W(\theta^0) \end{pmatrix}.$$

The continuity of $B(\theta^0)$ follows from our assumptions about $\{Q_i(\phi_1,\ldots,\phi_a)\}$,
and the positive definiteness from the nonsingularity of $V(\theta^0)$ and $W(\theta^0)$.
$K_i(N)$ and $M_i(N)$ of Weiss (1975) are in our present case $\sqrt{N}$,
$N^{1/6-\Delta}$ respectively.